Avoid repeated parser table decode and cut parse setup overhead (issue 630)#631

Merged
ratmice merged 2 commits into softdevteam:master from avityuk:issue-630-parse-performance
Apr 30, 2026

@avityuk
Contributor

avityuk commented Apr 28, 2026

While profiling a workload that parses many small inputs in a tight loop, I found two sources of avoidable per-parse overhead in lrpar.

First, generated parse() functions from lrpar_mod were calling _reconstitute(__GRM_DATA, __STABLE_DATA) on every invocation, which meant re-decoding the grammar and state table every time even though RTParserBuilder::new only borrows them. This change caches the reconstituted tables in generated code and reuses them across calls.
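As a rough illustration of the caching idea (not the code lrpar actually generates), the sketch below decodes a table once and hands out a shared reference afterwards. `Tables`, `decode`, `TABLE_BYTES`, and `DECODE_COUNT` are hypothetical stand-ins invented for this example; only `std::sync::OnceLock` is a real API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the decoded grammar and state table.
pub struct Tables {
    pub rules: Vec<String>,
}

/// Counts how often `decode` runs; present only to make the caching visible.
pub static DECODE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the serialized table data embedded in generated code.
static TABLE_BYTES: &[u8] = b"expr,term,factor";

/// Stand-in for the expensive decode step.
fn decode(bytes: &[u8]) -> Tables {
    DECODE_COUNT.fetch_add(1, Ordering::SeqCst);
    let text = std::str::from_utf8(bytes).expect("demo data is valid UTF-8");
    Tables {
        rules: text.split(',').map(str::to_owned).collect(),
    }
}

/// Returns the decoded tables, running `decode` on the first call only.
pub fn tables() -> &'static Tables {
    static CACHE: OnceLock<Tables> = OnceLock::new();
    CACHE.get_or_init(|| decode(TABLE_BYTES))
}

/// Stand-in for a generated `parse()`: it only borrows the cached tables.
pub fn parse(input: &str) -> bool {
    tables().rules.iter().any(|r| r == input)
}
```

Calling `parse` any number of times, from any number of threads, runs `decode` once; `OnceLock::get_or_init` blocks concurrent callers until the first initialization finishes.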

Second, there were a couple of smaller setup costs in lrpar::parser:

  • parser::token_cost was stored as Box<&dyn Fn(...)>, introducing a heap allocation around an already-borrowed callback on every parse.
  • parse_map and parse_actions collected the lexer iterator into Vec<Result<...>> and then walked it again to build the lexeme vector.
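The two setup costs above can be sketched in isolation. This is a simplified model, not lrpar's actual code: `TokId`, `Parser`, and `collect_lexemes` are hypothetical names, and the sketch assumes the first lexing error aborts collection.

```rust
/// Hypothetical token identifier type.
pub type TokId = u8;

/// Holds the cost callback as a plain borrow.
/// The boxed form, `Box<&'a dyn Fn(TokId) -> u8>`, would allocate on the
/// heap just to store a reference that is already borrowed.
pub struct Parser<'a> {
    token_cost: &'a dyn Fn(TokId) -> u8,
}

impl<'a> Parser<'a> {
    /// Stores the borrowed callback directly; no allocation happens here.
    pub fn new(token_cost: &'a dyn Fn(TokId) -> u8) -> Self {
        Parser { token_cost }
    }

    /// Invokes the borrowed callback.
    pub fn cost(&self, tok: TokId) -> u8 {
        (self.token_cost)(tok)
    }
}

/// Collects lexemes in a single pass over the lexer iterator.
/// Collecting into `Result<Vec<_>, _>` stops at the first `Err`, so no
/// intermediate `Vec<Result<...>>` is built and walked a second time.
pub fn collect_lexemes<I>(lexer: I) -> Result<Vec<TokId>, String>
where
    I: Iterator<Item = Result<TokId, String>>,
{
    lexer.collect()
}
```

The borrowed form keeps the same dynamic dispatch as before; only the extra box is gone.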

This PR removes those extra costs by:

  • caching generated parser tables behind a OnceLock
  • introducing an opaque lrpar::ParserTables wrapper so generated code does not need to name lrtable types directly
  • storing token_cost as a borrowed callback rather than boxing it
  • collecting lexemes in one pass
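To show why an opaque wrapper helps generated code, here is a minimal sketch under stated assumptions. The `runtime` module stands in for the library crate; `Grammar`, `StateTable`, `decode`, `num_rules`, and `num_states` are hypothetical and are not the real `lrpar::ParserTables` API.

```rust
use std::sync::OnceLock;

/// Stands in for the library crate in this sketch.
pub mod runtime {
    // Hypothetical internal types that generated code should not have to name.
    struct Grammar {
        rule_names: Vec<&'static str>,
    }
    struct StateTable {
        num_states: usize,
    }

    /// Opaque wrapper: private fields, so callers can store it in a
    /// static without importing or naming the internal types.
    pub struct ParserTables {
        grm: Grammar,
        stable: StateTable,
    }

    impl ParserTables {
        /// Hypothetical decode entry point taking the embedded byte blobs.
        pub fn decode(grm_data: &'static [u8], stable_data: &'static [u8]) -> Self {
            let text = std::str::from_utf8(grm_data).expect("demo data is valid UTF-8");
            ParserTables {
                grm: Grammar { rule_names: text.split(',').collect() },
                stable: StateTable { num_states: stable_data.len() },
            }
        }

        pub fn num_rules(&self) -> usize {
            self.grm.rule_names.len()
        }

        pub fn num_states(&self) -> usize {
            self.stable.num_states
        }
    }
}

/// What the generated side might look like: one wrapper type, one cache.
pub fn generated_tables() -> &'static runtime::ParserTables {
    static TABLES: OnceLock<runtime::ParserTables> = OnceLock::new();
    TABLES.get_or_init(|| runtime::ParserTables::decode(b"expr,term", &[0, 1, 2]))
}
```

With this shape, the generated module depends on a single public type, so internal table representations can change without touching the generated code.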

Since the time spent in _reconstitute is proportional to grammar size, caching the tables is particularly impactful for large grammars.

However, even on a very small grammar, such as the calc_actions example, these changes brought a tight-loop parse benchmark down from roughly 2.06 µs/parse to 0.80 µs/parse.

Comment thread lrpar/src/lib/ctbuilder.rs Outdated
@ltratt
Member

ltratt commented Apr 28, 2026

I have one easy comment: @ratmice does this look OK to you?

@ratmice
Collaborator

ratmice commented Apr 28, 2026

I haven't yet had a chance to look, but I will try and have a gander this evening, in a couple of hours.

@ratmice
Collaborator

ratmice commented Apr 28, 2026

Looks like what it says on the tin, so this looks OK to me. Couldn't help but join the bikeshed a little, though.

avityuk force-pushed the issue-630-parse-performance branch 3 times, most recently from dcdfda0 to bdd2053, April 30, 2026 02:56
avityuk force-pushed the issue-630-parse-performance branch from bdd2053 to bf1870b, April 30, 2026 03:00
ratmice added this pull request to the merge queue Apr 30, 2026
Merged via the queue into softdevteam:master with commit a50ddd5 Apr 30, 2026
2 checks passed
ratmice linked an issue Apr 30, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Generated parse() re-decodes grammar tables on every call

3 participants